ABSTRACT

The present Monte Carlo simulation study was designed
to examine the effects of multidimensionality during the
administration of computerized adaptive testing (CAT). It was assumed
that multidimensionality existed in the individuals to whom test
items were being administered, i.e., that the correct or incorrect
responses given by an individual were generated from a specified
multidimensional structure, rather than the unidimensional item
response theory (IRT) model normally assumed to have generated the
observable dichotomous test item responses. The dichotomous response
was then treated for CAT item selection and ability estimation
purposes as if it had been generated by the unidimensional model. To
the extent that the observed item response was affected by dimensions
other than the first (which corresponded to the single dimension
assumed to underlie the item selection and ability estimation
process), errors should be introduced into the adaptive testing
process. These errors should affect the ability estimates and the
efficiency of CAT. The study focused on the nature and degree of
these errors under a variety of multidimensional structures, to
determine how robust CAT is to the effects of multidimensionality in
examinees' responses to test items. (PN)
Before computerized adaptive testing (CAT) can be applied in various
operational settings, its characteristics must be evaluated under a variety of
conditions. Studies of the reliability and validity of CAT (e.g., Johnson &
Weiss, 1980; Kiely, Zara, & Weiss, 1983; McBride & Martin, 1983; Moreno, Wetzel,
McBride, & Weiss, 1984; Sympson, Weiss, & Ree, 1982) provide important
information comparing CAT to conventional tests in applied situations.
Live-testing studies such as these, however, are expensive and time-consuming,
provide results that are dependent on the characteristics of the sample of
subjects and the specific criterion variables used, and do not permit an answer
to the important questions about how well CAT measures true ability levels and
whether ability is estimated better at some ability levels than at others.
Live-testing studies also incorporate a number of uncontrolled sources of error
(e.g., item parameter estimation error, various errors of measurement due to
idiosyncratic characteristics of examinee responses to test items) which
further complicate the process of reaching generalizable conclusions.

Monte Carlo simulation provides a means of systematically examining the
performance of CAT under a variety of conditions and of identifying the effects
of various kinds of errors on the performance of CAT strategies. Early studies
were concerned with the comparison of CAT item selection strategies with
conventional tests (e.g., Betz & Weiss, 1973, 1974, 1975; Larkin & Weiss, 1974;
Vale & Weiss, 1975a, 1975b) and with each other (e.g., Larkin & Weiss, 1975).
These studies provided global evaluations of CAT strategies that were useful in
eliminating some strategies from further consideration. Later studies then
concentrated on the more promising strategies, generally those that are based on
item response theory (IRT), examining the performance of these testing
strategies conditional on true ability levels (e.g., McBride, 1977; Vale, 1975;
Weiss & McBride, 1984).

One factor that can affect the performance of CAT is the nature of the item
pool from which it draws items. McBride (1977; Weiss & McBride, 1984) studied
the performance of a Bayesian CAT in perfect and ideal item pools and in
realistic item pools in which the IRT difficulty and discrimination parameters
were correlated. Others (e.g., Urry, 1974) also examined CAT performance in a
variety of item pool configurations.

In addition to the distributions of item difficulties and discriminations
in a given item pool, the degree of error in the IRT item parameter estimates in
a real item pool can affect the performance of CAT, particularly since items are
selected on the basis of their IRT parameter estimates. Crichton (1981)
investigated the effects of errors in item parameter estimates on the
performance of maximum information and Bayesian CAT strategies in the context of
the 3-parameter logistic model. Mattson (1983) extended Crichton's study to the
1- and 2-parameter logistic models, both Bayesian and maximum likelihood
scoring, and to the more realistic situation in which the IRT difficulty and
discrimination parameters had varying degrees of correlation. These later
studies provide valuable information about the performance of CAT under the
realistic situation in which adaptive testing is to be done using item pools
with parameters estimated with varying degrees of error.

A second factor that is likely to have an effect on the performance of
IRT-based CAT is multidimensionality. Operational IRT models used for CAT assume
that unidimensionality exists at two stages: (1) in the process of estimating 
item parameters, and (2) in the process by which an individual generates a re- 
sponse to a test item with given item parameters. Presumably, any deviations 
from unidimensionality that exist at either of those stages in CAT could result 
in non-optimal performance of IRT-based CAT strategies. 

While many tests of ability and achievement approximate unidimensionality, 
none have shown the strict unidimensionality required by operational IRT models. 
This motivated Drasgow and Parsons (1983) to examine the effects of deviations 
from unidimensionality during the item parameter estimation process on IRT item 
parameter estimates. 

Purpose 

The present Monte Carlo simulation study was designed to examine the
effects of multidimensionality during CAT test administration. It was assumed
that multidimensionality existed in the individuals to whom test items were
being administered, i.e., that the correct or incorrect responses given by an
individual were generated from a specified multidimensional structure, rather
than the unidimensional IRT model normally assumed to have generated the
observable dichotomous test item responses. The dichotomous response was then
treated for CAT item selection and ability estimation purposes as if it had been
generated by the unidimensional model. To the extent that the observed item
response was affected by dimensions other than the first (which corresponded to
the single dimension assumed to underlie the item selection and ability
estimation process), errors should be introduced into the adaptive testing
process. These errors should affect the ability estimates and the efficiency of
CAT. The study focused on the nature and degree of these errors under a variety
of multidimensional structures, to determine how robust CAT is to the effects of
multidimensionality in examinees' responses to test items.



METHOD 

Initial Factor Analyses 

Item response vectors for forms 8A and 8B of the Armed Services Vocational
Aptitude Battery (ASVAB) were obtained for a sample of military recruits. For
those subtests of the ASVAB (Mathematics Knowledge, General Science, and
Mechanical Comprehension) in which forms 8A and 8B were identical except for the
order of the items, the response vectors for form 8B were rearranged to match
the order of the items in form 8A. This resulted in datasets with sample sizes
of 5,127 for these three subtests, sample sizes of 2,621 for form 8A of the
other seven subtests, and sample sizes of 2,506 for form 8B of the other seven
subtests.

Tetrachoric inter-item correlations were computed for eight of the ten
subtests; the Numerical Operations and Coding Speed subtests were not included
in further analyses due to the speeded nature of these subtests. The tetrachoric
correlations for the other eight subtests were then factor-analyzed using a
principal axes factor extraction method and a Varimax rotation. Of the
resulting factor structures, the factor structure of the General Science subtest
exhibited the greatest degree of multidimensionality. Table 1 lists the factor
loadings on the first four factors for the items in this subtest. This factor
structure was used as the model for generating subsequent factor structures with
varying degrees of multidimensionality.

Generation of Factor Structures 

The first step in creating factor structures with varying degrees of
multidimensionality was to round the 25 factor loadings on the first factor of
the ASVAB General Science (GS) subtest to the nearest multiple of .05. This set
of 25 rounded factor loadings was then repeated six times to create a set of
factor loadings for 150 items on one factor with the same configuration of
loadings as the first factor for the ASVAB GS subtest. This factor, the original
strength ASVAB factor (OSAF), was used as the basis for one of three sets of
factor structures.
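
As an illustration only, the rounding and replication step can be sketched in Python. The function names, the tie-breaking rule (halves rounded away from zero), and the flat list representation are assumptions introduced here; the study does not describe how the step was programmed.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_step(value: float, step: str = "0.05") -> float:
    """Round a factor loading to the nearest multiple of `step`.

    Decimal arithmetic avoids binary floating-point surprises at the
    rounding boundary. Rounding halves away from zero is an assumption.
    """
    quantum = Decimal(step)
    units = (Decimal(str(value)) / quantum).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return float(units * quantum)


def build_first_factor(loadings, repeats: int = 6):
    """Round each loading, then repeat the rounded set `repeats` times."""
    rounded = [round_to_step(x) for x in loadings]
    return rounded * repeats


if __name__ == "__main__":
    # Three legible first-factor loadings from Table 1 (items 6, 7, and 9).
    print(build_first_factor([0.703, 0.572, 0.546], repeats=2))
```

With 25 loadings and six repetitions the function returns 150 values, matching the size of the simulated item pool.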

Sixteen factor structures of varying dimensionality were constructed using
OSAF as the first factor. Factors other than the first factor were constructed
to be proportional in strength to the first factor. These sixteen factor
structures are described in Table 2. Factor structures varied from a 2-factor
structure with the second factor 1/8 as strong as the first factor (Dataset 2)
to a 3-factor structure with Factors 2 and 3 equal in strength to Factor 1
(Dataset 16). An additional dataset (17) consisted of the actual 4-factor ASVAB
GS factor solution.

The 150 factor loadings on OSAF were then increased to yield a first factor
that was approximately 1.5 times as strong as OSAF. This new first factor (1.5
OSAF) was used as the first factor in a set of sixteen different factor
structures which are also described in Table 2 (Datasets 18-33). In Datasets
18-32, factors other than the first factor were again constructed to be
proportional to this strengthened first factor. The exception was the 4-factor
structure (Dataset 33), where the second, third, and fourth factors were the
actual second, third, and fourth factors from the original factor analysis of
the ASVAB GS subtest (see Table 1).

The 150 factor loadings on OSAF were then increased a second time to result
in a first factor that was approximately twice as strong as OSAF. This
strengthened first factor (2.0 OSAF) was used as the first factor in a third set
of twelve factor structures (Datasets 34-45), which are also described in Table
2. In Datasets 34-44, factors other than the first factor were constructed to
be proportional in strength to this increased-strength first factor; these
additional factors were also constructed to avoid communalities greater than 1.0
for any item. For the 4-factor structure of Dataset 45, the factors other than
the first factor were taken directly from the original factor analysis of the
ASVAB GS subtest (see Table 1).

Table 1

Factor Loadings for the First Four Factors of the
ASVAB General Science Subtest

Item
Number      Factor 1    Factor 2    Factor 3    Factor 4

   1           --         -.215       -.250        .027
   2          .624        -.205       -.018       -.303
   3          .642        -.201       -.095        .026
   4         [.4?6]       -.098       -.118       -.115
   5           --         -.233        .162        .069
   6          .703        -.160        .066        .073
   7          .572         .052        .103       -.019
   8          .493        -.070        .072       -.067
   9          .546        -.174       -.239        .119
  10          .547        -.212       -.015        .016
  11          .595         .060        .009       -.025
  12          .398         .099        .058       -.006
  13          .580         .096       -.120       -.233
  14         [.580]       [.172]      -.069        .124
  15          .438        -.029        .043        .337
  16          .543         .012        .100        .172
  17          .462         .120       -.030       -.009
  18          .639         .227        .054       -.072
  19          .371         .208        .045       -.011
  20          .473         .048        .132       -.096
  21          .460         .273        .085        .006
  22          .283         .224        .115       -.032
  23          .480         .035        .147       -.062
  24          .387         .650       -.310        .067
  25          .396         .310        .101        .089

Factor
Contribution 7.541        1.671       1.030       1.023

Note. -- = entry illegible in the source; bracketed entries are only partly
legible, and their digits or signs are uncertain.

Generation of Response Vectors 

To evaluate the effect of violation of the assumption of unidimensionality 
in adaptive testing, sets of dichotomous (0,1) item responses were generated 
using the factor structures with varying degrees of multidimensionality. 
Table 2

Dataset Numbers for Datasets Based on
First Factors of 1.0, 1.5, and 2.0 OSAF,
and Factor Strengths of Factors 2 through 4,
for Each of the Datasets

                                 Factor Strength as a
      Dataset Number            Proportion of Factor 1

  1.0      1.5      2.0
  OSAF     OSAF     OSAF      Factor 2   Factor 3   Factor 4

    1       18       34          -          -          -
    2       19       35         1/8         -          -
    3       20       36         1/4         -          -
    4       21       37         1/3         -          -
    5       22       38         1/2         -          -
    6       23       39         2/3         -          -
    7       24       40         3/4         -          -
    8       25                  1.0         -          -
    9       26       41         1/8        1/8         -
   10       27       42         1/4        1/4         -
   11       28       43         1/3        1/3         -
   12       29       44         1/2        1/4         -
   13       30                  1/2        1/2         -
   14       31                  2/3        1/3         -
   15       32                  2/3        2/3         -
   16                           1.0        1.0         -
   17       33       45        GS-2*      GS-3*      GS-4*

*Factor derived from factor analysis of
ASVAB GS test.



The first step was to assign θ levels for each factor to a number of
hypothetical examinees (simulees). This was done for each factor except the
first factor by using a random number generator to create uniform distributions
of 1,700 θ values between -3.2 and +3.2 for each factor independently of all
other factors. θ levels for the first factor were assigned so that 100 simulees
were assigned to each of 17 θ levels ranging from -3.2 to +3.2 in increments of
.4. θ levels for the first factor were assigned in this manner in order to have
a sufficient number of replications at each θ level so that indices conditional
on θ could be computed.
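
The assignment of trait levels described above can be sketched as follows. This is a minimal illustration: the function names and the use of Python's `random` module are assumptions, and the study's own random number generator is not identified in the text.

```python
import random


def first_factor_thetas(n_levels: int = 17, per_level: int = 100,
                        low: float = -3.2, step: float = 0.4):
    """Fixed grid for factor 1: `per_level` simulees at each theta level."""
    levels = [round(low + k * step, 1) for k in range(n_levels)]
    return [level for level in levels for _ in range(per_level)]


def other_factor_thetas(n_simulees: int, rng: random.Random,
                        low: float = -3.2, high: float = 3.2):
    """Independent uniform draws for any factor beyond the first."""
    return [rng.uniform(low, high) for _ in range(n_simulees)]


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed, used only for repeatability
    theta1 = first_factor_thetas()
    theta2 = other_factor_thetas(len(theta1), rng)
    print(len(theta1), theta1[0], theta1[-1], min(theta2), max(theta2))
```

Fixing the first-factor levels on a grid, rather than sampling them, is what makes it possible to compute indices conditional on θ with 100 replications per level.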

Next, matrices of item response theory (IRT) item parameters were calculated
and generated. Item discrimination parameters (a) were computed using the
following formula:

[Equation 1: not recoverable from the source]

where $a_{gj}$ = item discrimination parameter for item $g$ and factor $j$, and
      $F_{gj}$ = factor loading for item $g$ on factor $j$.

These matrices of parameters were calculated for each of the 45 factor
structures.
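
A relation often used to convert a factor loading into a normal-ogive discrimination parameter is a = F / sqrt(1 - F^2). The sketch below applies that relation. Whether the study used exactly this form is an assumption, as are the function name and the decision to reject loadings with absolute value of 1 or more.

```python
import math


def loading_to_discrimination(loading: float) -> float:
    """Convert a factor loading F to a = F / sqrt(1 - F**2).

    This is a commonly cited relation, assumed here for illustration;
    the source does not preserve the formula actually used.
    """
    if not -1.0 < loading < 1.0:
        raise ValueError("loading must lie strictly between -1 and 1")
    return loading / math.sqrt(1.0 - loading ** 2)


if __name__ == "__main__":
    for f in (0.30, 0.60, 0.70):
        print(f, round(loading_to_discrimination(f), 3))
```

Under this relation the discrimination grows without bound as the loading approaches 1, which is one reason strengthened factors must be kept below that limit.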

Matrices of item difficulty parameters (b) were generated for each of the
45 factor structures using a random number generator which generated a uniform
distribution of 150 values between -3.2 and +3.2 independently for each factor
in a given factor structure. Item pseudo-guessing parameters (c) were also
generated for each factor in the 45 factor structures; they were generated to
yield a normal distribution of 150 values with a mean of .20 and a standard
deviation of .02 for each factor.

After the item parameter matrices for each factor structure were
determined, the probability of a correct response to each item for each factor
was computed for each of the 1,700 simulees using the three-parameter logistic
model,

[Equation 2: not recoverable from the source]

where $P_{ijg}(\theta_{ij})$ = probability of a correct response to item $g$ on
          factor $j$ for a simulee with trait level $\theta_{ij}$,

      $c_{gj}$ = IRT pseudo-guessing parameter for item $g$ on factor $j$,

      $a_{gj}$ = IRT discrimination parameter for item $g$ on factor $j$, and

      $b_{gj}$ = IRT difficulty parameter for item $g$ on factor $j$.
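
The three-parameter logistic model is commonly written as P = c + (1 - c) / (1 + exp(-D a (θ - b))). The following sketch evaluates that standard form for one item on one factor. The scaling constant D = 1.7 is an assumption, since the text does not state whether it was used, and the function and argument names are illustrative.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """Standard 3PL probability of a correct response.

    theta: trait level; a: discrimination; b: difficulty;
    c: pseudo-guessing; d: logistic scaling constant (assumed 1.7).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


if __name__ == "__main__":
    # At theta == b the probability is halfway between c and 1.
    print(p_3pl(theta=0.0, a=1.0, b=0.0, c=0.2))
```

With c = .20, the mean pseudo-guessing value used in the study, the probability never falls below .20 no matter how low the trait level.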

The probabilities for each item on each factor were then combined using
Equation 3 to calculate the overall probability of a correct response for each
individual on each item:

[Equation 3: not recoverable from the source]

where $r_{ig}$ = overall probability of a correct response for simulee $i$ on
          item $g$,
      $F_{gj}$ = factor loading for item $g$ on factor $j$, and
      $P_{ijg}$ = probability of a correct response for simulee $i$ on factor
          $j$ for item $g$.

Dichotomous item scores ($u_{ig}$) were then generated using $r_{ig}$ and a
random number generator. For each simulee and item, a random number between 0
and 1 was generated. If $r_{ig}$ was greater than this random number, an item
score $u_{ig}$ = 1 was assigned for the response of simulee $i$ to item $g$. If
$r_{ig}$ was less than the random number, an item score $u_{ig}$ = 0 was
assigned to the item for the simulee. In this manner, each of the 1,700 simulees
received an item score of 0 or 1 on each of the 150 items for each factor
structure.
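
The scoring rule just described is a Bernoulli draw: a response is scored 1 when the overall probability exceeds a uniform random number. A minimal sketch follows. It takes the overall probabilities as given, and the function names, the seed, and the nested-list layout are assumptions made for illustration.

```python
import random


def score_item(r: float, rng: random.Random) -> int:
    """Return 1 if the overall probability r exceeds a uniform draw, else 0."""
    return 1 if r > rng.random() else 0


def generate_responses(prob_matrix, seed: int = 0):
    """Turn a simulee-by-item matrix of probabilities into 0/1 scores."""
    rng = random.Random(seed)
    return [[score_item(r, rng) for r in row] for row in prob_matrix]


if __name__ == "__main__":
    # Two hypothetical simulees answering three items.
    probabilities = [[0.9, 0.5, 0.2], [0.6, 0.3, 0.1]]
    print(generate_responses(probabilities, seed=42))
```

Over many draws the proportion of 1s for a given cell converges to its probability, so the generated response matrix reproduces the intended multidimensional structure in expectation.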

Adaptive Testing Strategy 

The sets of dichotomous item responses $u_{ig}$ generated from the factor
structures with varying degrees of multidimensionality were used with a maximum
information adaptive testing strategy to obtain θ estimates. Since the adaptive
testing strategy used assumes a unidimensional set of item responses, the
obtained θ estimates can be used to determine the effect of violation of the
assumption of unidimensionality. For each factor structure:

1. θ̂ was set to 0.0 for each simulee.

2. Information at θ̂ was computed for each of the 150 items using first-factor
   a, b, and c parameters in the following equation:

   $$ I_g(\hat{\theta}) = \frac{[P'_g(\hat{\theta})]^2}{P_g(\hat{\theta})\,Q_g(\hat{\theta})} \tag{4} $$

   where $I_g(\hat{\theta})$ = information at θ̂ for item $g$,
         $P_g(\hat{\theta})$ = probability of a correct response to item $g$
             at θ̂,
         $P'_g(\hat{\theta})$ = first derivative of $P_g(\hat{\theta})$, and
         $Q_g(\hat{\theta})$ = 1 - $P_g(\hat{\theta})$.

3. The item with the highest level of information at θ̂ was selected as the next
   item to be administered.

4. The item responses to the item chosen to be administered were read from the
   generated item response matrix for each simulee.

5. A new θ̂ was calculated for each simulee using maximum likelihood scoring:

   $$ L(\hat{\theta}_i \mid u_i) = \prod_{g=1}^{K} P_{ig}(\hat{\theta}_i)^{u_{ig}}\, Q_{ig}(\hat{\theta}_i)^{1-u_{ig}} \tag{5} $$

   where $L(\hat{\theta}_i \mid u_i)$ = likelihood of the simulee's observed
             response pattern ($u_i$) at $\hat{\theta}_i$,
         $P_{ig}(\hat{\theta}_i)$ = probability of a correct response to item
             $g$ for simulee $i$ with trait level estimate $\hat{\theta}_i$,
         $u_{ig}$ = 1 for a correct response to item $g$,
                  = 0 for an incorrect response to item $g$,
         $Q_{ig}(\hat{\theta}_i)$ = 1 - $P_{ig}(\hat{\theta}_i)$, and
         $K$ = the number of items administered.

   The value of θ̂ which had the greatest likelihood for the observed item
   responses was selected as the new θ estimate for a simulee (θ̂ was restricted
   to the range -4 to +4).

6. Steps 2 through 5 were repeated using the new θ̂s for each simulee until 30
   items were administered.

7. The θ̂s were saved at 5, 10, 15, 20, 25, and 30 items.
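
The seven steps above can be sketched for a single simulee as follows. Several details are assumptions rather than the study's method: the scaling constant D = 1.7, the use of a grid search in place of whatever iterative maximum likelihood routine was actually used, the grid spacing of .01, and all function names. Items are represented as (a, b, c) tuples of first-factor parameters.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the text


def logistic(theta, a, b):
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) * logistic(theta, a, b)


def item_information(theta, a, b, c):
    """Equation 4: squared slope of P divided by P * Q."""
    psi = logistic(theta, a, b)
    p = c + (1.0 - c) * psi
    slope = (1.0 - c) * D * a * psi * (1.0 - psi)  # first derivative of P
    return slope * slope / (p * (1.0 - p))


def select_item(theta_hat, items, used):
    """Step 3: the unused item with the most information at theta_hat."""
    unused = [g for g in range(len(items)) if g not in used]
    return max(unused, key=lambda g: item_information(theta_hat, *items[g]))


def log_likelihood(theta, responses, items):
    """Logarithm of Equation 5 for (item index, 0/1 score) pairs."""
    total = 0.0
    for g, u in responses:
        p = min(max(p_correct(theta, *items[g]), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def ml_estimate(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Step 5 by grid search on [lo, hi]; the grid is an assumption."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_points)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


def run_cat(score_lookup, items, test_length=30):
    """Steps 1 to 6; score_lookup(g) returns the stored 0/1 score for item g."""
    theta_hat, responses, used = 0.0, [], set()
    for _ in range(test_length):
        g = select_item(theta_hat, items, used)
        used.add(g)
        responses.append((g, score_lookup(g)))
        theta_hat = ml_estimate(responses, items)
    return theta_hat, responses
```

A response pattern that is all correct or all incorrect has no finite maximum, so the estimate lands on the boundary of the restricted range; this is the role played by the limits of -4 and +4. Saving the estimate at intermediate test lengths (step 7) would only require recording `theta_hat` inside the loop.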

Evaluative Indices 

Conditional indices. Since no one optimal evaluative index was available,
four different evaluative indices were used to determine the effect of
violations of the assumption of unidimensionality in adaptive testing. Each of
the following four indices was computed at each of the 17 θ levels on the first
factor and for all six test lengths.

1. Bias:

   $$ \mathrm{Bias}(\theta_{p1}) = \frac{\sum_{i=1}^{N(\theta_{p1})} (\hat{\theta}_i - \theta_{i1})}{N(\theta_{p1})} \tag{6} $$

   where $\hat{\theta}_i$ = estimated θ level for simulee $i$,
         $\theta_{i1}$ = true θ level for simulee $i$ on factor 1, and
         $N(\theta_{p1})$ = number of simulees at level $p$ (usually 100, but
             occasionally smaller due to maximum likelihood convergence
             failures).

   This index takes into account both the size and direction of the difference
   between true and estimated θ.

2. Inaccuracy:

   $$ \mathrm{Inaccuracy}(\theta_{p1}) = \frac{\sum_{i=1}^{N(\theta_{p1})} \lvert \hat{\theta}_i - \theta_{i1} \rvert}{N(\theta_{p1})} \tag{7} $$

   Inaccuracy considers only the size, and not the direction, of the difference
   between estimated and actual θ levels for each simulee at a given θ level
   and test length.

3. Root Mean Square Error (RMSE). RMSE was calculated as

   $$ \mathrm{RMSE}(\theta_{p1}) = \left[ \frac{\sum_{i=1}^{N(\theta_{p1})} (\hat{\theta}_i - \theta_{i1})^2}{N(\theta_{p1})} \right]^{1/2} \tag{8} $$

   This index gives more weight to larger differences between estimated and
   true θ levels.

4. Efficiency. Efficiency was defined by

   $$ \mathrm{Efficiency}(\theta_i) = \frac{\sum_{g^*=1}^{K} I_{g^*}(\theta_i)}{\sum_{g=1}^{K} I_g(\theta_i)} \tag{9} $$

   where $g^*$ indexes items actually administered and $g$ indexes the items
   with the maximum levels of information at $\theta_i$.

   Thus, efficiency is the ratio of the information in the K items actually
   administered to the information in the K most informative items at
   $\theta_i$. It will equal 1.0 when the adaptive testing strategy administers
   the K items with maximum information at $\theta_i$. Deviations from 1.0
   result from the fact that, at any stage of the adaptive test, θ̂ is not
   usually exactly equal to $\theta_i$.
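
The four conditional indices can be computed as in the sketch below. It assumes that the estimates for all simulees at one true θ level are held in a plain list and that item information at the true θ has already been computed for every item in the pool; the function names and the sample values are illustrative and are not taken from the study.

```python
import math


def bias(estimates, true_theta):
    """Mean signed error: estimated minus true theta."""
    return sum(e - true_theta for e in estimates) / len(estimates)


def inaccuracy(estimates, true_theta):
    """Mean absolute error."""
    return sum(abs(e - true_theta) for e in estimates) / len(estimates)


def rmse(estimates, true_theta):
    """Root mean square error."""
    squared = sum((e - true_theta) ** 2 for e in estimates)
    return math.sqrt(squared / len(estimates))


def efficiency(information, administered):
    """Information in the administered items relative to the best K items.

    information: item information at the true theta for every pool item.
    administered: indices of the K items actually given.
    """
    k = len(administered)
    obtained = sum(information[g] for g in administered)
    best = sum(sorted(information, reverse=True)[:k])
    return obtained / best


if __name__ == "__main__":
    estimates = [0.5, 1.5, 1.0]  # hypothetical estimates at true theta = 1.0
    print(bias(estimates, 1.0), inaccuracy(estimates, 1.0), rmse(estimates, 1.0))
    print(efficiency([0.1, 0.5, 0.3, 0.4], administered=[1, 2]))
```

In the hypothetical example the two errors cancel, so bias is zero while inaccuracy and RMSE are not, which illustrates why the indices were reported together.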

Comparison of conditional multidimensional and unidimensional results. To
summarize the effects of multidimensionality on each of the evaluative indices,
distance measures were computed across the 17 θ levels between the values of
each of the conditional evaluative indices for the unidimensional (UD) datasets
and the multidimensional (MD) datasets for all six test lengths. Cronbach and
Gleser's (1953) formulas were used for computing a distance measure, $D^2$,
between two profiles and for decomposing $D^2$ into components due to mean
differences, scatter differences, and shape differences. Profiles were plots of
the values of an evaluative index for a given dataset and test length across all
17 θ levels. The formulas used were:

$$ D^2_{UD,MD} = \sum_{p=1}^{17} (x_{pUD} - x_{pMD})^2 \tag{10} $$

where $D^2_{UD,MD}$ = overall squared distance between profile UD and profile
          MD,
      $x_{pUD}$ = value of the evaluative index for dataset UD and θ level $p$,
          and
      $x_{pMD}$ = value of the evaluative index for dataset MD and θ level $p$.

$$ D'^{2}_{UD,MD} = D^2_{UD,MD} - 17\,\Delta^2 El_{UD,MD} \tag{11} $$

where $D'^{2}_{UD,MD}$ = squared distance between profiles UD and MD after
          differences in mean level between the two profiles are eliminated,
      $\Delta^2 El_{UD,MD}$ = squared difference in mean level between profiles
          UD and MD, and

$$ D''^{2}_{UD,MD} = \frac{D'^{2}_{UD,MD} - \Delta^2 S_{UD,MD}}{S_{UD}\,S_{MD}} \tag{12} $$

where $D''^{2}_{UD,MD}$ = squared distance between profiles UD and MD after
          differences due to mean level and scatter between the two profiles
          are eliminated,

$$ S_{UD} = \left[ \sum_{p=1}^{17} (x_{pUD} - \bar{x}_{UD})^2 \right]^{1/2} \tag{13} $$

where $S_{UD}$ = scatter for profile UD,
      $\bar{x}_{UD}$ = mean of the 17 values of the evaluative index for
          profile UD,
      $S_{MD}$ and $\bar{x}_{MD}$ are defined similarly, and
      $\Delta^2 S_{UD,MD}$ = squared difference between scatters for profiles
          MD and UD.

The presence of scatters less than 1.00 for many of the datasets resulted
in values of $D''^{2}$ that were larger than the values of $D^2$ for the same
profiles. This made interpretation of the distance measures difficult, so the
values of each of the four evaluative indices at each of the 17 θ levels were
multiplied by 10. This fact should be taken into account in interpreting the
magnitude of the differences between profiles and the distance measures.

To aid in interpreting the differences in profiles due to level, scatter,
and shape, the proportion of the squared distance ($D^2$) due to each of these
components was computed using the following formulas:

$$ \text{Level Effect}_{UD,MD} = \frac{D^2_{UD,MD} - D'^{2}_{UD,MD}}{D^2_{UD,MD}}, \tag{14} $$

the proportion of $D^2$ due to differences in level between profiles UD and MD,

$$ \text{Scatter Effect}_{UD,MD} = \frac{D'^{2}_{UD,MD} - D''^{2}_{UD,MD}}{D^2_{UD,MD}}, \tag{15} $$

the proportion of $D^2$ due to differences in scatter between the two profiles,
and

$$ \text{Shape Effect}_{UD,MD} = \frac{D''^{2}_{UD,MD}}{D^2_{UD,MD}}, \tag{16} $$

the proportion of $D^2$ due to differences in shape between profiles MD and UD.
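
The distance measures of Cronbach and Gleser (1953) and the level effect can be computed as in the following sketch. Function names, the dictionary keys, and the example profiles are assumptions made for illustration; the scatter and shape proportions are omitted because their formulas are harder to read in the source.

```python
import math


def mean(values):
    return sum(values) / len(values)


def scatter(profile):
    """Square root of the sum of squared deviations from the profile mean."""
    m = mean(profile)
    return math.sqrt(sum((x - m) ** 2 for x in profile))


def profile_distances(ud, md):
    """Return D2, D2 with level removed, D2 with level and scatter removed."""
    if len(ud) != len(md):
        raise ValueError("profiles must have the same number of levels")
    k = len(ud)  # 17 theta levels in the study
    d2 = sum((u - m) ** 2 for u, m in zip(ud, md))
    delta_el2 = (mean(ud) - mean(md)) ** 2
    d2_prime = d2 - k * delta_el2
    s_ud, s_md = scatter(ud), scatter(md)
    if s_ud == 0.0 or s_md == 0.0:
        raise ValueError("a flat profile has no defined shape")
    d2_double_prime = (d2_prime - (s_ud - s_md) ** 2) / (s_ud * s_md)
    level_effect = (d2 - d2_prime) / d2 if d2 > 0.0 else None
    return {
        "D2": d2,
        "D2_prime": d2_prime,
        "D2_double_prime": d2_double_prime,
        "level_effect": level_effect,
    }


if __name__ == "__main__":
    # Hypothetical three-level profiles with identical shape.
    print(profile_distances([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Multiplying every index value by 10 multiplies the first two distances by 100 but leaves the shape distance unchanged, because the division by the two scatters standardizes each profile. This is why rescaling the indices changes the relative size of the shape distance and the overall distance.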

Unconditional indices. In addition to examining the bias, inaccuracy, and
RMSE conditional on θ level, mean values of these indices were computed across
the 17 θ levels for each dataset and test length. Also computed for each
condition was the fidelity correlation between θ̂ and θ. These correlations were
computed for a normally distributed sample of 630 simulees selected from the
1,700 rectangularly distributed simulees in each dataset.
RESULTS

Unconditional Indices

Fidelity

Table 3 shows fidelity correlations for each of the datasets based on OSAF,
1.5 OSAF, and 2.0 OSAF, as a function of test length. For the single-factor
Dataset 1, fidelity increased with increasing test length from .646, when 5
items were administered, to .928 at 30 items. For the 2-factor datasets (2-8)
fidelity generally decreased with increasing strength of the second factor with
two exceptions: (1) Dataset 5, which had a second factor 1/2 as strong as the
first factor, had consistently higher fidelity than Dataset 4, in which the
second factor was only 1/3 as strong as the first; and (2) Dataset 6, in which
the second factor was 2/3 the strength of the first, had consistently lower
fidelity than Dataset 7, in which the second factor was slightly stronger (3/4
of the first). In both these cases, differences between the fidelities decreased
with increases in test length. For all datasets, fidelity increased with
increasing test length.

For these 2-factor datasets, multidimensionality had fairly substantial
effects on fidelity. For example, at the 15-item test length fidelity was .872
for the single-factor Dataset 1, but dropped to .548 when there were two equal
factors (Dataset 8). When the second factor was only 1/4 the strength of the
first factor (Dataset 3), fidelity for a 15-item test decreased from .872 to
.784. To overcome the effect of this degree of multidimensionality, the 15-item
test of Dataset 3 would need to be doubled in length, resulting in a fidelity of
.880. For degrees of multidimensionality beyond those represented by Dataset 3,
tests would need to be well beyond 30 items in length to equal the fidelity of
the 15-item test in UD Dataset 1.
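
The test-length comparison in this paragraph amounts to a lookup in Table 3: find the shortest tabled length at which a multidimensional dataset reaches the fidelity of the unidimensional 15-item test. The sketch below uses the Table 3 rows for Datasets 1, 3, and 8; the function and variable names are illustrative assumptions.

```python
LENGTHS = (5, 10, 15, 20, 25, 30)

# Fidelity values for three 1.0 OSAF datasets, as reported in Table 3.
FIDELITY = {
    1: (0.646, 0.799, 0.872, 0.903, 0.914, 0.928),  # unidimensional
    3: (0.519, 0.692, 0.784, 0.833, 0.863, 0.880),  # factor 2 = 1/4
    8: (0.429, 0.510, 0.548, 0.580, 0.616, 0.631),  # factor 2 = 1.0
}


def shortest_length(dataset: int, target: float):
    """Shortest tabled test length whose fidelity reaches `target`.

    Returns None when no tabled length (up to 30 items) is sufficient.
    """
    for n_items, value in zip(LENGTHS, FIDELITY[dataset]):
        if value >= target:
            return n_items
    return None


if __name__ == "__main__":
    target = FIDELITY[1][LENGTHS.index(15)]  # UD fidelity at 15 items
    print(shortest_length(3, target), shortest_length(8, target))
```

For Dataset 3 the lookup returns 30 items, and for Dataset 8 it returns no tabled length, in line with the statements above.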

A similar pattern of results was observed for the 3-factor structures
(Datasets 9-16), but the effects of multidimensionality on fidelity were even
stronger. In these datasets there was, again, a general decrease in fidelity
with increasing strength of the second and third factors. Fidelity also
increased with test length for all datasets. In general, however, fidelities
were lower for the 3-factor datasets than for those with two factors, even when
the total variance accounted for by factors beyond the first was equal. For
example, at the 15-item length, fidelity for Dataset 13 (with factors 2 and 3
each 1/2 of the first factor in strength) was .443; when the same amount of
variance was concentrated in only the second factor (Dataset 8), fidelity was
.548. Only Dataset 9, with second and third factors each 1/8 of the first
factor, attained a sufficiently high fidelity at 30 items (.869) to approximate
that of UD Dataset 1 at 15 items (.872).

Results for the 1.5 and 2.0 OSAF datasets were similar to those for 1.0
OSAF, with a general increase in fidelity with increasing strength of the first
factor. For example, for a 15-item test based on a 2-factor structure with the
second factor 3/4 the strength of the first factor, fidelity was .628 for 1.0
OSAF (Dataset 7), .685 for 1.5 OSAF (Dataset 24), and .789 for 2.0 OSAF (Dataset
40). For the 3-factor datasets with the second and third factors each 1/3 of
the first factor, fidelity for a 15-item test in the 1.0 OSAF data was .567
(Dataset 11), rising to .654 when factor 1 was 1.5 OSAF (Dataset 28) and to .758
with 2.0 OSAF (Dataset 43). As in the 1.0 OSAF data, a single factor beyond the
first had less effect on fidelity than did two factors equaling the strength of
the single factor, though the effect diminished substantially with the stronger
first factor. For example, in the 1.5 OSAF structures for a 15-item test with a
second factor 2/3 of the first factor (Dataset 23), fidelity was .724 versus
.654 when there were two factors beyond the first, each comprising 1/3 of the
first factor (Dataset 28); comparable factor structures with 2.0 OSAF resulted
in fidelities of .756 (Dataset 39) and .758 (Dataset 43).

Table 3

Fidelity as a Function of Test Length for
Unidimensional (UD) and Multidimensional Datasets
Based on First Factors 1.0, 1.5, and 2.0 Times as
Strong as the ASVAB General Science Factor

            No. of         Test Length (Number of Items)
Dataset     Factors      5      10      15      20      25      30

1.0 OSAF
   1 (UD)      1       .646    .799    .872    .903    .914    .928
   2           2       .592    .762    .823    .866    .896    .909
   3           2       .519    .692    .784    .833    .863    .880
   4           2       .461    .592    .672    .718    .765    .790
   5           2       .534    .648    .711    .780    .813    .826
   6           2       .404    .543    .616    .658    .677    .705
   7           2       .431    .572    .628    .662    .694    .715
   8           2       .429    .510    .548    .580    .616    .631
   9           3       .522    .665    .760    .821    .847    .869
  10           3       .423    .567    .655    .706    .737    .763
  11           3       .375    .477    .567    .633    .678    .710
  12           3       .340    .467    .559    .614    .652    .679
  13           3       .320    .386    .443    .499    .548    .584
  14           3       .350    .467    .529    .574    .618    .645
  15           3       .313    .383    .418    .434    .473    .490
  16           3       .267    .339    .371    .400    .415    .438
  17           4       .577    .723    .80?    .847    .871    .893

1.5 OSAF
  18 (UD)      1       .691    .842    .916    .937    .949    .955
  19           2       .660    .822    .881    .912    .931    .945
  20           2       .587    .753    .848    .892    .914    .924
  21           2       .560    .740    .828    .877    .904    .912
  22           2       .569    .737    .808    .842    .867    .878
  23           2       .462    .616    .724    .772    .802    .812
  24           2       .478    .623    .685    .713    .748    .763
  25           2       .387    .510    .607    .651    .675    .697
  26           3       .590    .740    .816    .863    .892    .911
  27           3       .446    .596    .702    .752    .782    .801
  28           3       .439    .569    .654    .710    .755    .776
  29           3       .442    .578    .650    .702    .742    .759
  30           3       .447    .554    .637    .695    .731    .745
  31           3       .455    .589    .690    .732    .756    .771
  32           3       .415    .525    .610    .653    .681    .700
  33           4       .581    .765    .858    .892    .918    .932

2.0 OSAF
  34 (UD)      1       .733    .867    .930    .953    .961    .965
  35           2       .585    .775    .888    .932    .955    .964
  36           2       .599    .749    .850    .911    .927    .937
  37           2       .524    .694    .817    .860    .902    .924
  38           2       .604    .7?0    .807    .853    .866    .888
  39           2       .547    .655    .756    .816    .843    .849
  40           2       .542    .689    .789    .813    .836    .844
  41           3       .519    .690    .804    .875    .923    .929
  42           3       .542    .647    .744    .813    .841    .868
  43           3       .499    .631    .758    .831    .855    .874
  44           3       .534    .674    .777    .819    .840    .857
  45           4       .379    .482    .546    .618    .664    .700

Note. ? = digit illegible in the source.

Datasets 17, 33, and 45 provide results based on factors derived from the
ASVAB 4-factor structure, in which factors 2, 3, and 4 accounted for 22.2%,
13.6%, and 13.5%, respectively, of OSAF. Table 3 shows that there were
relatively small effects on fidelity for the 1.0 and 1.5 OSAF datasets,
particularly for tests of 20 or more items. For example, in Dataset 17 fidelity
for a 25-item test was .871 versus .914 for UD Dataset 1. Comparable results for
the 1.5 OSAF data were .918 (Dataset 33) and .949 (Dataset 18). In the 2.0 OSAF
data, however, the 4-factor ASVAB structure (Dataset 45) resulted in the lowest
observed fidelities for those datasets; fidelity dropped from .953 (UD Dataset
34) to .618 for ASVAB at 20 items, and from .965 to .700 at 30 items.

Bias, Inaccuracy, and RMSE

Table 4 provides data on mean bias, inaccuracy, and RMSE for the datasets
based on 1.0 OSAF. For UD Dataset 1, bias decreased from .282 at 5 items to
.010 at 30 items. Each of the 2-factor datasets (2-8) showed lower levels of
positive bias and higher levels of negative bias than did Dataset 1, with bias
becoming increasingly negative as the strength of the second factor increased.
Thus, in 2-factor data structures θ̂ underestimated θ, on the average, as both
test length and strength of multidimensionality increased. A similar trend was
observed for most of the 3-factor datasets (9-16), with a few exceptions. In
these datasets bias tended to become less positive and increasingly negative for
all test lengths for Datasets 9-12, in which the sum of the variance accounted
for by the second and third factors was less than that of the first factor. In
Dataset 13, which had second and third factors each 1/2 of the first factor,
bias was again positive for tests of 15 items or less, but this effect was
reversed for Dataset 14 (factor 2 = 2/3 of factor 1, and factor 3 = 1/3 of
factor 1). However, for tests of 5 or 10 items, bias then again became positive
for Datasets 15 and 16, which had very strong second and third factors. There
was also a slight trend toward positive mean bias in Dataset 16. As Table 4
also shows, there was a slight effect on bias when data were generated from the
4-factor ASVAB structure (Dataset 17). For these data the ASVAB structure
resulted in a slight mean underestimation of θ at test lengths of 20 to 30 items
with a mean bias of .006 at 15 items compared with .038 for Dataset 1.

Both inaccuracy and RMSE tended to increase with increasing strength of
factors beyond the first, and to decrease with increasing test length; this held
true for both the 2- and 3-factor datasets. An exception occurred for Dataset
14 (factor 2 = 2/3 of factor 1, and factor 3 = 1/3 of factor 1) for both
inaccuracy and RMSE. For this dataset inaccuracy and RMSE values were lower than
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Table 4

Mean Bias, Inaccuracy, and RMSE as a Function of Test
Length for Unidimensional (UD) and Multidimensional
Datasets, Based on ASVAB General Science Factor

          No. of           Test Length (Number of Items)
Dataset   Factors       5       10       15       20       25       30

Bias

1 (UD)       1        .282     .107     .038     .024     .015     .010
2            2        .247     .100     .031    -.015    -.028    -.026
3            2        .163     .056    -.004    -.026    -.042    -.051
4            2        .189     .060    -.022    -.052    -.072    -.084
5            2        .164     .065    -.017    -.053    -.075    -.085
6            2        .170     .038    -.023    -.070    -.099    -.107
7            2        .023    -.071    -.125    -.136    -.147    -.153
8            2        .057    -.026    -.103    -.135    -.171    -.190
9            3        .306     .133     .033     .012    -.020    -.028
10           3        .113    -.007    -.080    -.090    -.101    -.115
11           3        .173     .007    -.039    -.086    -.104    -.115
12           3        .128     .022    -.046    -.068    -.096    -.111
13           3        .397     .231     .061    -.040    -.089    -.131
14           3       -.051    -.097    -.127    -.174    -.193    -.201
15           3        .139     .059    -.060    -.122    -.171    -.210
16           3        .379     .240     .101     .006    -.068    -.142
17           4        .214     .075     .006    -.028    -.03?    -.032

Inaccuracy

1 (UD)       1        .906     .587     .451     .388     .357     .332
2            2        .982     .657     .517     .446     .402     .370
3            2       1.055     .713     .570     .493     .447     .413
4            2       1.251     .941     .770     .676     .617     .573
5            2       1.183     .870     .715     .604     .549     .521
6            2       1.247     .948     .791     .713     .664     .638
7            2       1.308    1.012     .861     .789     .733     .698
8            2       1.387    1.112     .997     .915     .863     .841
9            3       1.146     .781     .603     .512     .455     .418
10           3       1.373    1.027     .848     .731     .661     .612
11           3       1.424    1.100     .915     .801     .725     .675
12           3       1.455    1.135     .948     .837     .765     .720
13           3       1.622    1.292    1.113    1.016     .941     .888
14           3       1.549    1.244    1.062     .954     .875     .829
15           3       1.643    1.419    1.298    1.200    1.131    1.092
16           3       1.733    1.492    1.371    1.280    1.222    1.179
17           4       1.055     .734     .581     .489     .435     .399

RMSE

1 (UD)       1       1.211     .785     .603     .514     .461     .425
2            2       1.328     .904     .694     .591     .521     .474
3            2       1.417     .980     .773     .658     .587     .539
4            2       1.659    1.296    1.090     .958     .868     .805
5            2       1.574    1.193     .984     .824     .757     .704
6            2       1.659    1.309    1.116    1.014     .934     .884
7            2       1.734    1.407    1.214    1.120    1.050     .999
8            2       1.809    1.498    1.356    1.258    1.201    1.162
9            3       1.539    1.069     .844     .702     .613     .547
10           3       1.800    1.401    1.198    1.043     .948     .882
11           3       1.855    1.500    1.276    1.122    1.021     .950
12           3       1.897    1.550    1.333    1.188    1.094    1.026
13           3       2.055    1.723    1.516    1.393    1.290    1.220
14           3       1.971    1.649    1.447    1.312    1.114    1.157
15           3       2.095    1.865    1.726    1.616    1.538    1.488
16           3       2.179    1.940    1.809    1.712    1.639    1.588
17           4       1.430    1.005     .797     .648     .572     .519
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those for Dataset 13, in which the amount of variance accounted for by factors 2 
and 3 was the same as in Dataset 14, but the factors were of equal strength; 
there was a trend for the difference between inaccuracies for the two datasets 
to increase as test length increased, with Dataset 14 resulting in lower mean 
inaccuracy. 

As in the bias data, small effects on inaccuracy and RMSE were observed for
the 4-factor ASVAB structure (Dataset 17). Both inaccuracy and RMSE decreased
with increasing test length. For a 15-item test, inaccuracy was .581 for
Dataset 17 versus .451 for Dataset 1; corresponding RMSE values were .797 and .603.

Although not shown here, similar trends for bias, inaccuracy, and RMSE were
observed in the 1.5 and 2.0 OSAF datasets. That is, mean bias became
increasingly negative with increasing multidimensionality and test length,
whereas mean inaccuracy and RMSE tended to increase with multidimensionality and
to decrease with test length. In general, however, the magnitudes of the
evaluative indices were lower, indicating less effect of multidimensionality
with a stronger first factor.
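
The three overall indices summarized in Table 4 can be illustrated with a short sketch. It assumes the conventional definitions (bias as the mean signed error θ̂ − θ, inaccuracy as the mean absolute error, and RMSE as the root mean squared error); the function name `error_indices` and the sample values are hypothetical.

```python
import math

def error_indices(theta_true, theta_hat):
    """Return (bias, inaccuracy, rmse) for paired true/estimated thetas.

    Assumed definitions:
      bias       = mean(theta_hat - theta)
      inaccuracy = mean(|theta_hat - theta|)
      rmse       = sqrt(mean((theta_hat - theta)^2))
    """
    errors = [h - t for t, h in zip(theta_true, theta_hat)]
    n = len(errors)
    bias = sum(errors) / n
    inaccuracy = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, inaccuracy, rmse

# Hypothetical values for illustration only.
theta_true = [-1.0, 0.0, 1.0, 2.0]
theta_hat = [-0.5, -0.5, 1.5, 1.5]
bias, inaccuracy, rmse = error_indices(theta_true, theta_hat)
print(bias, inaccuracy, rmse)
```

Under these definitions positive and negative errors cancel in the bias but not in the other two indices, so a mean bias near zero (as for several datasets at 15 items) is compatible with substantial inaccuracy and RMSE.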

Conditional Indices 

Effect of Test Length 

Bias. Figures 1a through 1c show values of mean bias at each of 17 θ levels.
Each figure compares the mean bias level across four different test
lengths (10, 15, 20, and 25 items) for datasets derived from a 2-factor
structure with the second factor 1/3 as strong as the first factor. The first
factor for the datasets in Figure 1a was 1.0 OSAF; in Figure 1b the first factor
was 1.5 OSAF; and in Figure 1c it was 2.0 OSAF.

In each of these three figures the mean bias level generally decreases with
increasing test length. This pattern is disrupted somewhat between θ levels of
-.80 to +.80, where the bias fluctuates around 0.0 and no test length
consistently shows a smaller mean bias level. Bias is most variable for the
10-item test length and least variable for the 25-item test. Regardless of the
strength of the first factor, the mean bias values at θ levels greater than .80
converge for all four test lengths. Similar patterns of bias across test lengths
were observed for the other datasets. In general, bias was negative for θs below
the mean and positive for θs above the mean, although this effect was much less
pronounced for the 1.5 OSAF datasets (Figure 1b) than for the 1.0 or 2.0 OSAF
datasets (Figures 1a and 1c).
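
The conditional indices differ from the overall ones only in that errors are averaged separately at each true θ level. A minimal sketch of conditional bias follows; the grouping by exact θ value, the function name `conditional_bias`, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

def conditional_bias(theta_true, theta_hat):
    """Mean signed error (theta_hat - theta) at each distinct true theta level.

    Assumption: simulated examinees are generated at a fixed set of theta
    levels (17 in the study), so grouping by exact value is sufficient.
    """
    errors_by_level = defaultdict(list)
    for t, h in zip(theta_true, theta_hat):
        errors_by_level[t].append(h - t)
    return {
        level: sum(errs) / len(errs)
        for level, errs in sorted(errors_by_level.items())
    }

# Hypothetical values: two simulees at each of three theta levels.
theta_true = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
theta_hat = [-0.8, -0.6, 0.1, -0.1, 0.7, 0.9]
profile = conditional_bias(theta_true, theta_hat)
for level, value in profile.items():
    print(f"theta = {level:+.1f}: mean bias = {value:+.2f}")
```

The resulting dictionary is the kind of bias profile plotted against θ in the figures; conditional inaccuracy and RMSE follow by swapping the signed error for the absolute or squared error.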

Inaccuracy. Figure 2 compares the mean inaccuracy levels at each of four
different test lengths (10, 15, 20, and 25 items) across all 17 θ levels for
Dataset 29, in which the first factor is 1.5 OSAF, the second factor is 1/2 as
strong as the first factor, and the third factor is 1/4 as strong as the first
factor. Inaccuracy tended to decrease with increasing test length. Inaccuracy
levels varied the most across θ levels for the 10-item test length and were most
constant for the 25-item test. This same pattern held for the comparisons across
test length of the mean inaccuracy values for each of the 45 datasets.

RMSE. Comparison of the conditional RMSE values for the same dataset 






Figure 1 

Conditional Bias of θ Estimates for Tests of 10, 15, 20, and 25 Items
for Datasets with Factor 1 of 1.0, 1.5, and 2.0 OSAF
and Factor 2 One-Third the Strength of Factor 1
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(b) 1.5 OSAF (Dataset 21) 
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(c) 2.0 OSAF (Dataset 37) 
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Figure 2 

Conditional Inaccuracy of θ Estimates for Tests of 10, 15, 20,
and 25 Items for Dataset 29
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Conditional RMSE of θ Estimates for Tests of 10, 15, 20, and
25 Items for Dataset 4
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across different test lengths yielded the same results as for inaccuracy. An
example is shown in Figure 3 for Dataset 4. RMSE decreases with increasing test
length, and the RMSE values for the shorter test lengths (10 and 15 items) vary
more across θ levels than those for the longer tests.

Efficiency. Comparison of the mean efficiency levels for a given dataset
across θ levels for a number of test lengths indicated that efficiency
increased with test length while following the same pattern across θ levels.
Figure 4 provides an example of these comparisons for Dataset 29 at 10-, 15-,
20-, and 25-item test lengths.

Since the results for all four conditional indices showed relatively
systematic trends as a function of test length, the remainder of the results
reported are only for the 15-item test length.

Effect of Multidimensionality 

Tables 5 through 8 contain values of the distance measures across 17 θ
levels for conditional values of each of the evaluative indices between each UD
dataset and each of the MD datasets with the same strength first factor, for
tests of 15 items in length. These tables also contain the proportions of the
distance measure due to level, scatter, and shape effects.
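
The decomposition of a profile distance into level, scatter, and shape components can be sketched as follows. This is an illustrative reconstruction, not the authors' program: it assumes a squared Euclidean distance D² between two profiles, D'² computed after removing each profile's mean, and D''² computed after also scaling each profile to unit scatter, with the shape proportion taken as D''²/D² and the scatter proportion as the remainder. These definitions agree with the entries of Table 5 to rounding error, but the function name `profile_distance` and the sample profiles are hypothetical.

```python
import math

def profile_distance(ud_profile, md_profile):
    """Decompose the squared distance between two profiles.

    Assumptions (labeled in the surrounding text):
      D2   = sum of squared differences between the raw profiles
      D2'  = same, after subtracting each profile's mean (elevation)
      D2'' = same, after also dividing each profile by its scatter
      level   = (D2 - D2') / D2
      shape   = D2'' / D2
      scatter = 1 - level - shape
    Both profiles must have nonzero scatter and must not be identical.
    """
    k = len(ud_profile)
    mean_u = sum(ud_profile) / k
    mean_m = sum(md_profile) / k
    d2 = sum((u - m) ** 2 for u, m in zip(ud_profile, md_profile))

    dev_u = [u - mean_u for u in ud_profile]
    dev_m = [m - mean_m for m in md_profile]
    d2_prime = sum((u - m) ** 2 for u, m in zip(dev_u, dev_m))

    # Scatter: square root of the sum of squared deviations from the mean.
    scatter_u = math.sqrt(sum(u * u for u in dev_u))
    scatter_m = math.sqrt(sum(m * m for m in dev_m))
    d2_dprime = sum(
        (u / scatter_u - m / scatter_m) ** 2 for u, m in zip(dev_u, dev_m)
    )

    level = (d2 - d2_prime) / d2
    shape = d2_dprime / d2
    return {
        "D2": d2, "D2_prime": d2_prime, "D2_dprime": d2_dprime,
        "level": level, "scatter": 1.0 - level - shape, "shape": shape,
    }

# Hypothetical profiles with the same shape but different mean and spread.
res = profile_distance([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(res)
```

In this toy case the two profiles are perfectly correlated, so the shape component is zero and the whole distance is split between level and scatter.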

Bias. Table 5 shows results of the profile analysis for bias. For the
datasets based on OSAF (Datasets 1-17), the UD dataset (Dataset 1) generally had
a higher mean bias (.38) and a lower variability (scatter) of bias (2.60) than
did the MD datasets (2-17). When a second factor was added to the data
(Datasets 2-8), D² values tended to increase with increasing strength of the
second factor; the exception to this is Dataset 5, in which D² values were
uniformly lower than in Dataset 4 even though the second factor in Dataset 5 was
stronger. The effect proportions show that in all these datasets the vast
majority of the differences in bias values as a result of multidimensionality
was due to increased scatter; in Datasets 2-8 at least 87% of the differences in
bias values from the UD dataset was due to scatter. Level effects accounted for
most of the remaining effect for most of these datasets, with the exception of
Dataset 2, in which the shape effect was slightly stronger than the level effect.
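
As a check on how the effect proportions relate to the distance measures, the arithmetic below recomputes them from the Table 5 entries for Datasets 5, 9, and 12. The formulas are an assumption inferred from the tabled values rather than stated in the text: level = (D² − D'²)/D², shape = D''²/D², and scatter as the remainder. The helper name `effect_proportions` is hypothetical.

```python
def effect_proportions(d2, d2_prime, d2_dprime):
    """Level, scatter, and shape proportions from D2, D2', and D2''.

    Assumed relations (inferred from Table 5, not given in the text):
      level   = (D2 - D2') / D2
      shape   = D2'' / D2
      scatter = 1 - level - shape
    """
    level = (d2 - d2_prime) / d2
    shape = d2_dprime / d2
    scatter = 1.0 - level - shape
    return round(level, 3), round(scatter, 3), round(shape, 3)

# (D2, D2', D2'') as printed in Table 5 for three datasets.
table5 = {
    5: (110.142, 105.592, 2.304),
    9: (29.444, 29.419, 1.811),
    12: (438.638, 427.464, 1.990),
}

for dataset, (d2, d2p, d2pp) in table5.items():
    level, scatter, shape = effect_proportions(d2, d2p, d2pp)
    print(f"Dataset {dataset}: level={level}, scatter={scatter}, shape={shape}")
```

The recomputed triples match the printed proportions for these rows (.041/.938/.021, .001/.938/.062, and .025/.970/.005), which supports reading the three proportions as a partition of the overall D².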

Similar results were observed for the 2-factor structure in which the first
factor was strengthened. For Datasets 19-25, based on 1.5 OSAF, overall D²
values increased regularly with increasing multidimensionality, but the absolute
values of D² were smaller than for the 1.0 OSAF data. For Datasets 35-40 a
similar but more irregular trend is evident, with smaller values of D² than for
1.0 OSAF or 1.5 OSAF, particularly for the higher strength second factors
(Datasets 37-40). The effect proportions for these datasets are similar to those
for the 1.0 OSAF data, though there is a tendency for multidimensionality to
result in slightly greater differences in level, with consequent reductions in
the scatter effect.

Figure 5 shows a typical result for bias with increasing multidimensionality
for the 1.5 OSAF data. (The values plotted in this figure and in the other
figures following are the untransformed values, so that the means and scatters
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Figure 4 

Conditional Efficiency of θ Estimates for Tests of 10, 15, 20,
and 25 Items for Dataset 29
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Figure 5 

Conditional Bias of θ Estimates for Datasets 18, 21, 23, and 25 
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Table 5

Elevation (Mean) and Scatter of Bias (x 10) for Unidimensional (UD) and
Multidimensional Datasets, Differences Between Elevation and Scatter,
Total Index D², D² with Elevation Removed (D'²), D² with Elevation and
Scatter Removed (D''²), and Proportion of D² Due to Level, Scatter, and Shape,
for Tests of 15 Items

                             Difference Between                                  Effect Proportion
Dataset    Mean   Scatter    Means    Scatter       D²        D'²      D''²    Level   Scatter   Shape

1 (UD)      .38     2.60
2           .31     3.76      .07      -1.17     25.524    25.439    2.468    .003     .900       ?
3          -.03     5.11      .41      -2.51     39.584    36.733    2.295    .072     .870       ?
4          -.20    13.57      .58     -10.98    212.407   206.674      ?      .027     .962     .012
5          -.14     9.56      .52      -6.96    110.142   105.592    2.304    .041     .938     .021
6          -.22    14.13      .60     -11.53    225.007   218.948    2.345    .027     .963     .010
7         -1.19    18.04     1.57     -15.44    388.634   346.573    2.309    .108     .886     .006
8          -.98    17.97     1.36     -15.37    378.507   346.969    2.371    .083     .910     .006
9           .34     5.01      .04      -2.42     29.444    29.419    1.811    .001     .938     .062
10         -.79    12.59     1.17     -10.00    206.466   183.387    2.552    .112     .876     .012
11         -.37    18.31      .75     -15.72    354.696   345.152    2.063    .027     .967     .006
12         -.43    20.52      .81     -17.93    438.638   427.464    1.990    .025     .970     .005
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are 1/10 of the comparable values in Tables 5 through 8.) This figure shows the
effect of the strength of the second factor increasing from 1/3 of the first
factor (Dataset 21) to 2/3 (Dataset 23) to 1.0 (Dataset 25). Bias for the UD
dataset (Dataset 18) is close to zero throughout the θ range. For the MD
datasets bias is close to zero for θ values close to 0.0, but it increases as
the levels progress toward either extreme, resulting in the increased scatter
due to increasing multidimensionality. Bias values are generally positive for θ
values less than 0.0 and negative for θ values greater than 0.0. For Dataset 21
with the smallest second factor (1/3), bias is not substantially different from
the UD dataset, except at extreme θ values; the major effect on bias for these
datasets seems to occur for Dataset 23 (factor 2 = 2/3), with the additional 1/3
added to factor 2 in Dataset 25 resulting in generally little additional bias.

Results for the 3-factor datasets (9-16, 26-32, and 41-44) are also in
Table 5. For the 1.0 OSAF data, overall D² increased regularly with increasing
strength of the second and third factors; for the 1.5 OSAF data, values of D²
were considerably lower, indicating less effect of increased strength of the
second and third factors with the stronger first factor; this trend is further
supported by Datasets 41-44 (2.0 OSAF), in which overall D² values were the
lowest for all the 3-factor datasets. For all but one of the 3-factor datasets
over 90% of the difference in bias values between the UD and MD datasets was due
to scatter (the exception being Dataset 10 with .876), with secondary effects
generally attributable to level effects.

Increasing dimensionality from two to three factors while holding constant
total proportion of variance accounted for by the factors resulted in increased
scatter of bias in most cases. For example, Dataset 6 was a 2-factor structure
with the second factor 2/3 of the first, whereas in Dataset 11 both the second
and third factors were 1/3 of the first factor. For Dataset 6 overall D² was
225, whereas Dataset 11 obtained a D² value of 355; in both cases the proportion
of D² due to scatter was about .96. A similar effect was observed with the 1.5
OSAF data: overall D² for Dataset 23 was 155, whereas that for Dataset 28 was
229. The 2.0 OSAF data did not, however, exhibit this effect, since overall D²
values for Datasets 39 and 43 were 138 and 139, respectively.

When results from the ASVAB 4-factor structure were compared to those of
the relevant UD datasets, very minor effects on bias were observed when OSAF was
used (Dataset 17) or when the first factor was increased to 1.5 times its
original strength (Dataset 33). In both cases mean bias was lower for the ASVAB
structure than for the UD structure, though the scatter of the bias was slightly
higher. The minor differences in bias for these datasets were, like the other
MD structures, primarily due to scatter (.839 for Dataset 17 and .899 for
Dataset 33). In contrast to the other MD structures, however, secondary effects
were more important for shape than for level, indicating that the ASVAB
structure changed the ordering of bias values across the 17 θ levels in
comparison to the UD datasets. However, since there were very small effects on
bias due to the ASVAB structure (overall D² values of 23 and 12), the shape
effects are likely not important.

Using the ASVAB structure with the 2.0 OSAF data (Dataset 45) resulted in
the largest overall D² for Datasets 35-45, a result considerably different than
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that observed for Datasets 17 and 33. These data indicate that bias increased
substantially both in overall level and variability from the comparable UD
dataset, with 86% of the differences in bias due to scatter and 14% due to level.
Since factors 2-4 were the same in all three ASVAB datasets, this difference can
be attributed only to the increased absolute strength of the first factor in
Dataset 45.

Inaccuracy. Table 6 contains the distance measures computed between the
inaccuracy profiles of the UD datasets and each of the MD datasets with the same
strength first factor. For the 2-factor structures overall, D² generally
increases with increasing strength of the second factor in both the datasets
based on 1.0 OSAF (2-8) and those based on 1.5 OSAF (19-25), with a similar but
more irregular trend in the datasets based on 2.0 OSAF. As for the bias
criterion, the value of D² tends to decrease as the strength of the first factor
increases, even though the relative strength of the second factor is the same,
indicating less effect on inaccuracy as the strength of the first factor
increases. The effect proportions for these data show that differences in
inaccuracy values were primarily the result of level effects that tended to
increase with increased strength of the second factor. This increasing level
effect occurred at the expense primarily of the scatter effect which, with a few
exceptions, tended to decrease with increasing strength of the second factor.
The only exception to the predominance of the level effect occurred when the
second factor was 1/8 as strong as the first factor in Dataset 2, in which case
the scatter effect was .547 and the level effect was .382; in the comparable
datasets (19 and 35) with similar strength second factors but stronger first
factors, the scatter effect was also relatively large. However, in all three of
these datasets, D² was relatively small, indicating little effect on inaccuracy
with a weak second factor.

A similar pattern was observed for the 3-factor structures (Datasets 9-16,
26-32, and 41-44). D² tended to increase with increasing strength of the second
and third factors, although the trend was more irregular for the 1.5 and 2.0
OSAF data. In all cases level accounted for a minimum of S6X of the squared 
difference between inaccuracy values for the UD and MD datasets. There was also 
a marked tendency for the effect of the second and third factors to diminish 
substantially as the first factor increased in strength. For example, in
Dataset 11 based on 1.0 OSAF and second and third factors each 1/3 as strong as
the first factor, D² was 382 with 96% due to level; in Dataset 28 based on 1.5
OSAF D² was 326 with 98% due to level, and in Dataset 43 based on 2.0 OSAF D²
was 141 with 88% due to level.

When the number of factors was increased from 2 to 3 while holding constant
the proportion of variance accounted for by factors beyond the first, D² tended
to increase, indicating a greater effect on inaccuracy for a larger number of
factors. For example, in the 1.0 OSAF data, D² for Dataset 5 (2 factors, second
factor 1/2 of first factor) was 130, whereas in Dataset 10 (3 factors, second
and third factors each 1/4 of first factor) D² was 288; similar effects were
observed in the 1.5 OSAF data for Datasets 22 versus 27 (D² = 91 vs. 275) and in
the 2.0 OSAF data for Datasets 38 and 42 (D² = 71 vs. 98). Figure 6 illustrates
the typical level effect found for inaccuracy within the 1.0 OSAF data. Dataset
3 with a weak (1/4) second factor results in inaccuracy values close to UD
Dataset 1, whereas inaccuracy increases for Dataset 10 with two factors each 1/4 of
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Table 6

Elevation (Mean) and Scatter of Inaccuracy (x 10) for Unidimensional (UD) and
Multidimensional Datasets, Differences Between Elevation and Scatter,
Total Index D², D² with Elevation Removed (D'²), D² with Elevation and
Scatter Removed (D''²), and Proportion of D² Due to Level, Scatter, and Shape,

for Tests of 15 Items
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the first factor, and increases again in Dataset 13 as factors 2 and 3 are again
increased to 1/2 of the first factor.

Figure 6 

Conditional Inaccuracy of θ Estimates for Datasets 1, 3, 10, and 13 
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The ASVAB factor structure (Datasets 17, 33, and 45) had slightly greater
effects on overall D² for inaccuracy (Table 6) than it did for bias (Table 5).
Similar to the bias data, however, the ASVAB structure resulted in lowest D² for
the 1.5 OSAF data (Dataset 33) and a very high value of D² in the 2.0 OSAF data
(Dataset 45). For all three datasets D² was primarily attributable to
differences in level of conditional inaccuracy, with a secondary effect due to
scatter of the inaccuracy values.

RMSE. The results for RMSE, shown in Table 7, have some similarity to
those for inaccuracy. That is, for both the 2- and 3-factor structures D²
generally increased as the strength of factors beyond the first increased. In
addition, the magnitude of D² decreased with increasing strength of the first
factor, indicating that the effect of factors beyond the first factor on RMSE
was less with a stronger first factor, even though succeeding factors were
proportionally as strong. In contrast to the inaccuracy results, however, for
the 2-factor structures (Datasets 2-8, 19-25, 35-40), MD datasets resulted in
RMSE values that were more variable than the UD datasets, as indicated by D²
scatter proportions in the range of .10 to .20 for most of the 1.0 and 1.5 OSAF
structures, and above .20 for many of the 2.0 OSAF datasets (35 to 38). With
only one exception (Dataset 2), however, the predominant effect of
multidimensionality was to increase the level of RMSE in all datasets, with the
greatest level effects observed in the 1.5 OSAF data.
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Table 7

Elevation (Mean) and Scatter of RMSE (x 10) for Unidimensional (UD) and
Multidimensional Datasets, Differences Between Elevation and Scatter,
Total Index D², D² with Elevation Removed (D'²), D² with Elevation and
Scatter Removed (D''²), and Proportion of D² Due to Level, Scatter, and Shape,

for Tests of 15 Items
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Figure 7 shows a typical example of the RMSE results. This figure displays 
RMSE values for the 1.5 OSAF UD dataset (18) and MD Datasets 22, 24, and 25, in 
which the strength of the second factor increased respectively from 1/2 to 3/4 
to 1.0 of the first factor. As can be seen, values of RMSE increased with 
increasing strength of the second factor, with only minor changes in their vari- 
ability. 

Figure 7 

Conditional RMSE of θ Estimates for Datasets 18, 22, 23, and 25 
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The patterns of RMSE results for the ASVAB data structures were similar to
those for inaccuracy. Lowest D² was observed for the 1.5 OSAF data (Dataset
33), whereas highest D² occurred for 2.0 OSAF. Even though the ASVAB structure
included four factors, D² values for the 1.0 and 1.5 OSAF structures were in the
range of those observed for 2-factor structures with second factors 1/8 to 1/4
those of the first factor (e.g., Datasets 2, 3, 19, 20). The ASVAB structure
tended to result in D² values with a higher scatter effect for the 1.0 and 1.5
OSAF datasets, in comparison to most of the other MD datasets, indicating more
variability in RMSE values as a function of θ levels than was evident in the
corresponding UD datasets.

Efficiency. D² values for efficiency are in Table 8. With the exception
of Dataset 2, the predominant difference in efficiency between the MD and UD
datasets in the 2-factor data for 1.0 OSAF (Datasets 2-8) and 1.5 OSAF (Datasets
19-25) was due to level; MD structures resulted in fairly constant levels of
lower efficiency in comparison to UD structures. In the 1.0
OSAF datasets the scatter/variability of observed efficiency values tended to
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Table 8

Elevation (Mean) and Scatter of Efficiency (x 10) for Unidimensional (UD) and
Multidimensional Datasets, Differences Between Elevation and Scatter,
Total Index D², D² with Elevation Removed (D'²), D² with Elevation and
Scatter Removed (D''²), and Proportion of D² Due to Level, Scatter, and Shape,

for Tests of 15 Items

                            Difference Between                           Effect Proportion
Dataset   Mean   Scatter    Means    Scatter     D²     D'²     D''²    Level   Scatter   Shape
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decrease with increasing strength of the second factor, with a somewhat more
irregular trend observed for the comparable 1.5 OSAF datasets. For the 3-factor
structures in the 1.0 and 1.5 OSAF datasets (Datasets 9-16 and 26-32), the
predominant result was an overall reduction of efficiency values as the strength
of the second and third factors increased. The level effect for these datasets
tended to be in the high .80s and low .90s with a minor secondary effect due to
scatter. In both the 1.0 and 1.5 OSAF structures, an increase from 2 to 3
factors while maintaining the same proportion of variance in the factors beyond
the first led to decreases in efficiency, as shown by D² values of 23 for
Dataset 6 (second factor 2/3 of the first) and for Dataset 11 (second and third
factors each 1/3 of the first).

Figure 8 shows the typical pattern of results for the 1.0 and 1.5 OSAF
data. The UD data structure (Dataset 18) shows a fairly flat and high pattern
of efficiency with a mean of .811. When a second factor 1/4 the strength of the
first factor is added in Dataset 20, mean efficiency drops to .75 with little
change in variability or shape. Datasets 27 and 30 show strong effects on
efficiency through most of the θ range when two factors are added to the first.
However, the strength of the second and third factors seems to have little
effect on efficiency, since factors 2 and 3 in Dataset 27 were each 1/4 of the
first factor, whereas these factors each accounted for 1/2 the variance of the
first factor in Dataset 30. The trend observed in Figure 8 for Datasets 27 and
30 appeared for most of the efficiency data: there was a tendency for strong
second and third factor structures to have a greater effect for lower θ levels
than for higher θ levels. This asymmetry was not evident in the bias,
inaccuracy, or RMSE results.

Figure 8 

Conditional Efficiency of θ Estimates for Datasets 18, 20, 27, and 30 
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A different pattern of results emerged for the 2.0 OSAF data. For the UD
Dataset 34, mean efficiency (.78) was slightly lower and its scatter higher than
for UD Datasets 1 and 18. As the strength of the second and third factors
increased, overall D² values increased to about the same levels as those
observed in comparable 1.0 and 1.5 OSAF data, indicating similar overall
reductions in efficiency; for example, D² in Dataset 40 (with a second factor
3/4 of the first factor) was 59, whereas the same structure in the 1.5 OSAF data
(Dataset 24) resulted in a D² of 55. The difference in the 2.0 OSAF data versus
the 1.5 and 1.0 OSAF data was in the pattern of the efficiency results. Whereas
in the latter data structures the predominant D² effect was for level, in the
2.0 OSAF data the majority of the change in efficiency due to multidimensionality
was due to scatter, with proportions ranging from .86 for Dataset 35 to .44 for
Dataset 39.

Figure 9 displays the typical pattern of results for the 2.0 OSAF data
structures. UD Dataset 34 has the flattest and generally highest efficiency
levels of the datasets plotted. The remainder of the datasets resulted in
highly variable efficiency values, all following a similar pattern and differing
little, even though Dataset 36 had only two factors with the
second factor only 1/4 the strength of the first, whereas Datasets 42 and 44
were 3-factor structures with the second and third factors combined accounting
for 1/2 and 3/4 the variance of the first factor, respectively. For all three
of these datasets, efficiency values for the MD structures exceeded those of the
UD structure for θ values in the range of -1.6 to -2.0 and above about 2.8.
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Figure 9 

Efficiency of θ Estimates for Datasets 34, 36, 42, and 44 
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Results for the ASVAB structures show small reductions in mean efficiency 
from .82 to .77 in the 1.0 OSAF data (Datasets 1 vs. 17), with a reduction in 
scatter; a similar small mean effect in the 1.5 OSAF data (.81 vs. .76); and a 
slight increase in scatter for Datasets 18 versus 33. When the first factor was 
increased to twice its original strength, addition of the three ASVAB factors 
resulted in a substantial decrease in mean efficiency and in a substantial 
increase in the variability of efficiency values; in Dataset 45 (the ASVAB 
structure) mean efficiency was .55 with scatter of .49, in comparison to values 
of .78 and .21 for the UD 2.0 OSAF structure (Dataset 34). However, for all 
three comparisons, level effects accounted for more than 70% of the differences 
between conditional efficiency levels for the ASVAB data and the comparable UD 
datasets. 



CONCLUSIONS 

As the overall degree of multidimensionality (as measured by the sum of the
eigenvalues for each factor) in the generated item responses increased, the
estimated θ values at each of the seventeen θ levels evaluated deviated further
from the true (first factor) θ values. This effect was evident in the
comparisons of overall bias, inaccuracy, and root mean square error (RMSE)
values for datasets with differing degrees of multidimensionality, and in all
the conditional indices. These comparisons showed increasing levels of each of
these evaluative indices as the multidimensionality of the underlying factor
structure increased. The effect was also evident in the decreased efficiencies
of datasets when compared to datasets with underlying factor structures that
were more unidimensional. Individual θ estimates also ordered individuals
differently from the true values, as reflected in the fidelity correlations.
The pattern of results, therefore, suggests that maximum information adaptive
testing is sensitive to changes in the dimensionality of the responses.
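
The overall indices named above can be illustrated with a minimal sketch. It
assumes the conventional definitions (bias as the mean signed error,
inaccuracy as the mean absolute error, RMSE as the root mean squared error,
and fidelity as the Pearson correlation between estimated and true θ); these
formulas are not restated in this section, so they should be read as
assumptions. The function name and the five data points are hypothetical and
are not values from the study.

```python
import math


def evaluative_indices(true_theta, est_theta):
    """Return bias, inaccuracy, RMSE, and fidelity for paired theta values.

    Definitions are assumed (conventional), not quoted from the report.
    """
    n = len(true_theta)
    errors = [e - t for t, e in zip(true_theta, est_theta)]

    bias = sum(errors) / n                           # mean signed error
    inaccuracy = sum(abs(d) for d in errors) / n     # mean absolute error
    rmse = math.sqrt(sum(d * d for d in errors) / n)

    # Fidelity: Pearson correlation between true and estimated theta.
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    fidelity = cov / math.sqrt(var_t * var_e)

    return {"bias": bias, "inaccuracy": inaccuracy,
            "rmse": rmse, "fidelity": fidelity}


# Hypothetical illustration only: five simulees with made-up estimates.
true_theta = [-2.0, -1.0, 0.0, 1.0, 2.0]
est_theta = [-1.5, -1.0, 0.5, 1.0, 2.5]
indices = evaluative_indices(true_theta, est_theta)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Applying the same function separately to the simulees at each θ level would
give conditional versions of the bias, inaccuracy, and RMSE indices, again
under the assumed definitions.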

While all degrees of multidimensionality had effects on all the evaluative
indices, effects were generally a function of the number of items administered.
Thus, for the overall indices in all multidimensional datasets, fidelities
increased with increasing test length, and inaccuracy and RMSE decreased, while
overall bias tended to change from fairly high positive values for short test
lengths to low negative values for the majority of multidimensional structures.
For the conditional indices, very similar patterns of results were observed for
different test lengths, with level effects (as opposed to scatter or shape
effects) predominant for all but the bias index. Even for conditional bias,
however, test length effects were roughly proportional for a given θ level.
Consequently, while maximum information adaptive testing is affected by
deviations from unidimensionality, the data suggest that in many cases, at
least for relatively small degrees of multidimensionality, the effects of
multidimensionality can be overcome simply by increasing test length. For
example, the ASVAB factor structure resulted in a fidelity of .802 for a
15-item test compared to .872 for the UD case. When the test for the
multidimensional ASVAB structure was lengthened to 25 items, the fidelity of
.871 was essentially the same as that of the 15-item unidimensional test. The
same pattern was observed when the first factor of the ASVAB structure was
strengthened by 50%.

The overall indices showed, in general, that increasing test length to
twice the length of the multidimensional tests will overcome the effects of
multidimensionality for multidimensional structures with one or two factors
beyond the first that account for up to one-fourth the variance of the first
factor. This finding held regardless of the strength of the first factor. Since
a similar result was observed for the ASVAB structure (in which factors 2, 3,
and 4 accounted for 22%, 13%, and 13% of the first factor, respectively) in the
1.0 and 1.5 OSAF data, the results suggest that the effects on maximum
information adaptive testing of multidimensional factor structures in which up
to one-third of the variance of the first factor appears in second and third
factors can be overcome by doubling adaptive test length. For degrees of
multidimensionality beyond these levels, however, adaptive test lengths would
need to be increased well beyond double to overcome the effects of
multidimensionality. This conclusion must be qualified, however, when bias of
the θ estimates is of concern, since the degree of bias differed at different θ
levels.
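
The arithmetic behind the "one-third" figure can be checked with a short
sketch. It uses only the ASVAB percentages quoted above; the variable names
are hypothetical, and whole-number percentages are used to avoid
floating-point rounding.

```python
# Variance of each later ASVAB factor as a percentage of the first factor's
# variance, as quoted in the text (factor number -> percent).
asvab_percent_of_first = {2: 22, 3: 13, 4: 13}

# Second and third factors combined: the quantity compared with "one-third".
second_and_third = asvab_percent_of_first[2] + asvab_percent_of_first[3]

# All factors beyond the first combined.
all_later_factors = sum(asvab_percent_of_first.values())

print(f"Factors 2 and 3: {second_and_third}% of the first factor")
print(f"Factors 2 to 4:  {all_later_factors}% of the first factor")
```

Factors 2 and 3 together come to 35% of the first factor, which is roughly
the one-third level named in the conclusion; including the fourth factor
brings the total to 48%.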

There was some evidence to suggest that the number of factors (2 vs. 3),
and not simply the overall strength of the underlying factor structure, affected
θ estimates. For example, a single factor beyond the first had less effect on
fidelity than did two factors that accounted for the same amount of variance.
In addition, there was more scatter of conditional bias with three factors than
with two, even though the proportion of variance in the second and third factors
was equal in the two structures. Thus, the more complex factor structures
seemed to affect the θ estimates more than the simpler structures. This
finding, however, did not appear to extend to the 4-factor ASVAB structures.

Several factors affect the generality of the conclusions drawn from this
research. First, the results are limited to the particular multidimensional
model used to generate the multidimensional response vectors. Use of other
models, such as those reviewed by Reckase and McKinley (1985), may yield
different results. Second, the results are limited to maximum information
adaptive testing with maximum likelihood scoring. Third, different factor
structures might result in different findings, since only one basic first
factor was used in this study. Thus, the study should be replicated varying
these factors to further evaluate the robustness of adaptive testing to
deviations from the unidimensional item response theory model used to select
and to score test items.